ABSTRACT

Comparison was made of the results of three analyses of teacher
judgments concerning the selection of curriculum materials for the
teaching of writing in elementary school. Twenty-five male and female
fourth and fifth grade teachers, with teaching experience ranging from
4 to 33 years, responded to questions on their judgments of the value
of language arts activities (described by short statements of purpose
and a listing of the steps involved in planning and conducting the
activity). The intent of the study was to determine the validity of
three types of research used in exploring the judgment process:
(1) policy capturing analysis; (2) process tracing analysis; and
(3) analysis of teachers' self-reports of their judgment processes.
Among conclusions reached were the following: (1) Teachers as judges
may have better insight into their own decision processes than
researchers usually give them credit for. Closer attention should be
paid to differences in language and level of detail offered by the
various methods and to what kind of data is used to evaluate the
validity of verbal reports. (2) Better models of the tasks in which
judgment is being examined should be developed. (3) More should be
known about how experience influences judgment. (4) Multi-method
approaches will probably provide more accurate results. (JD)

Abstract

This report compares the results of three analyses of data on teachers'
judgments concerning the selection of curriculum materials for the
teaching of writing in elementary school. The three analyses are a policy
capturing analysis, a process tracing analysis, and an analysis of
teachers' self-reports of their judgment processes. The authors conclude
that teachers as judges may have better insight into their own decision
processes than researchers have given them credit for. They call for more
attention in judgment research to modeling of the judgment task and
hypothesizing about the differences in judgment heuristics used by novice
and experienced teachers.



SELF-REPORTS OF TEACHER JUDGMENT¹
Robert J. Yinger and Christopher M. Clark²

Recently, it has been reasserted that teachers, like people in general,
are unaware of how they use and weigh information to make judgments
(Shavelson and Stern, 1982). This claim is based on a comparison of
teachers' self-reports of their judgment "policies" with mathematical
models of these decisions generated in policy-capturing research.

Within the research on human judgment there has been debate over whether
these findings are due to a person's inability to properly perceive mental
processes (e.g., Nisbett and Wilson, 1977) or are due to the inherent
characteristics of the linear models used in policy capturing (e.g., Dawes
and Corrigan, 1974). To date, evidence has been most frequently collected
by comparing policy capturing models and self-reports on the basis of
their predictive ability. This paper reports an effort to extend
researchers' knowledge about a judge's self- (or meta-cognitive) awareness
of mental processes by comparing self-reports of teachers' judgments about
instructional materials with both a policy-capturing model and a more
descriptive process-tracing model.

Policy capturing is a popular and frequently used method of studying and
representing human judgment. This approach begins with a simple model
(usually linear) and attempts to reproduce the inferential response of a
particular judge. Of central interest in this paradigm is how judges weigh
and combine information provided in the form of discernible cues or
features of the objects to be judged. This approach has been used recently
to study a number of aspects of teacher thinking, including teacher
judgments about characteristics of effective teachers (Anderson, 1977),
classroom organization (Borko, 1978), decisions about reading instruction
(Borko & Niles, 1982), classroom management (Cone, 1978), instructional
strategies (Russo, 1978), and instructional content (Floden, Porter,
Schmidt, Freeman, & Schwille, 1981).

¹This paper was presented at the April, 1983 annual meeting of the
American Educational Research Association in Montreal.

²Robert Yinger is a former IRT senior researcher with the Teacher
Planning Project and an associate professor of education at the University
of Cincinnati. Chris Clark coordinated the IRT's Teacher Planning Project
and now co-coordinates the Written Literacy Project. He is an associate
professor in the Department of Counseling, Educational Psychology, and
Special Education.

Process tracing methods of studying judgment take a very different
approach to the problem of investigating and representing thinking
processes. Since introspective reports of many judges seem to indicate the
presence of a complex, configural judgment process, process-tracing
methods begin with a complex representation of the judgment in the form of
verbal protocols and attempt to simplify the processes by representing the
judgment in the form of decision trees, network representations, or flow
diagrams. This approach has been used most widely in cognitive psychology
(especially the study of problem solving) and has been only rarely applied
to the study of teacher thinking.

This study is one part of a series investigating teacher judgment during
the evaluation of instructional materials. This series includes a study
investigating the factors influencing the selection of instructional
activities (Clark, Yinger, & Wildfong, 1978), a policy-capturing study of
teacher judgment (Yinger, Clark, & Mondol, 1981), a process tracing study
of teacher judgment (Yinger & Clark, 1982), and an analysis of teachers'
self-reported judgment processes (this paper).

The underlying hypothesis of these studies is that the selection of
attractive, appropriate, and effective instructional activities is an
important step in teacher planning for instruction (Yinger, 1977). We have
argued elsewhere (Clark & Yinger, 1977) that a greater number and variety
of studies are required about teacher judgment of students, of curriculum
materials, and of other important aspects of the classroom environment
before such research will be useful in policy and training decisions. This
set of studies adds to the teacher judgment data by investigating teacher
thinking in realistically complex situations. By applying various modeling
methods to judgment situations like those regularly encountered by
elementary school teachers, we are also evaluating the usefulness of these
methods for describing the complexities and subtleties of teachers' mental
lives.

Method 

Subjects 

The subjects in this study were 25 fourth and fifth grade teachers from
two Michigan school districts. Eight of the teachers were male and 17 were
female. Their ages ranged from the mid-20s to the mid-30s. The average
number of years of teaching experience was 9 years with a range from 4 to
33 years. Sixteen teachers taught in self-contained classrooms, while nine
taught in team-teaching situations or a combination of team-teaching and
departmental arrangements. Seven of the teachers taught in urban settings,
18 in suburban communities, and one in a rural area. All of the teachers
volunteered for the study and were paid for their participation.

Materials 

The materials for the study consisted of 32 one- or two-page descriptions
of language-arts writing activities. These descriptions were derived from
activities selected from a commercially available instructional catalogue
of language-arts activities for upper elementary classrooms (Forte, Frank,
& McKenzie, 1973). The activity descriptions were all presented in the
same general format consisting of an activity title, a one- or
two-sentence statement of the purpose of the activity, and a listing of
the steps involved in planning and conducting the activity.

Each activity description was edited to reflect five dimensions found to
be important in teachers' judgments of the quality of language-arts
instructional materials (Clark, Yinger, & Wildfong, 1978). These
dimensions, or cues, were

1. student involvement,

2. difficulty for students,

3. integration with other skills or subject matter,

4. demand on teachers, and

5. fit between stated purpose and described instructional process.

The 32 activity descriptions were constructed to represent a full
factorial matrix of high and low values for each cue. The manipulation and
final assessment of each description was accomplished by means of
independent ratings of each activity by four researchers, with negotiation
between raters when disagreement occurred.
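
The full factorial design described above can be enumerated mechanically. The following is a minimal sketch, assuming each of the five cues takes exactly two levels (high and low). The cue labels and the function name `factorial_matrix` are illustrative and do not come from the study materials.

```python
from itertools import product

# Short labels for the five cues named in the text (labels are illustrative).
CUES = ("involvement", "difficulty", "integration", "demand", "fit")


def factorial_matrix(cues=CUES, levels=("low", "high")):
    """Return one dict per activity description, covering every combination of levels."""
    return [dict(zip(cues, combo)) for combo in product(levels, repeat=len(cues))]


matrix = factorial_matrix()
print(len(matrix))  # 2 ** 5 = 32 activity descriptions
for cue in CUES:
    # In a full factorial design each cue is high in exactly half of the cells.
    print(cue, sum(row[cue] == "high" for row in matrix))
```

Because every combination appears exactly once, the cues are uncorrelated with one another across the set of descriptions, which is what allows each cue's contribution to a rating to be estimated separately.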

We asked each participant to respond to four questions about each
activity. On the back of each activity description the questions were
stated along with a nine-point continuum to record each response. The four
questions or judgments to be made about each activity were as follows:

1. How attractive is this activity to you?

2. How appropriate is this activity as part of a catalogue of
language-arts activities for fourth and fifth grade teachers?

3. How likely would you be to use this activity as it is in your
present classroom?

4. How effective do you think this activity would be for your
students?

Procedure

Having received an explanation of the purposes, procedures, and materials
for the study and having completed a set of six warm-up activities, each
teacher responded to the four questions for each of the 32 activity
descriptions. Nineteen of the teachers (the Policy-Capturing-Only Group)
provided data only in the form of their ratings on the reverse side of
each activity description. The remaining six teachers (the Process-Tracing
Group) participated in individual sessions. These teachers, in addition to
recording their ratings for each description, were asked to "think aloud"
as they participated in the task. These verbalizations were tape recorded
and later typed into protocols of the judgment task. At the conclusion of
the judgment sessions, all teachers were asked to respond to an instrument
requesting them to report the factors influencing their judgments by
distributing 100 points among the general categories: students, self as
teacher, materials, and other (could be specified by the teacher).

Data Analysis

Linear regression equations were computed for each of the four judgments
made by each participant. The five activity features were treated as
independent variables onto which the ratings given to each case were
regressed. The regression equations produced by this analysis yielded for
each teacher a set of weightings representing his or her cue use for each
of the four judgments. For the teachers in the Process-Tracing Group,
models of the judgment process were constructed from the verbal protocols.
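
The regression step can be illustrated with a short sketch. It is not the authors' analysis: the source does not say how the regressions were run, and the ratings below come from an invented judge (`hypothetical_rating`), not from any teacher in the study. The sketch assumes the cues are coded +1 (high) and -1 (low); all function names are illustrative.

```python
from itertools import product

CUES = ("involvement", "difficulty", "integration", "demand", "fit")


def design_matrix(cues=CUES):
    """All 32 high/low combinations, coded +1 (high) and -1 (low)."""
    return [dict(zip(cues, combo)) for combo in product((-1, 1), repeat=len(cues))]


def capture_policy(design, ratings, cues=CUES):
    """Least-squares fit of ratings on the cues; returns (intercept, weights, r_squared)."""
    n = len(design)
    intercept = sum(ratings) / n
    # The +/-1 columns of a balanced full factorial are mutually orthogonal and
    # sum to zero, so each least-squares slope reduces to sum(x * y) / n.
    weights = {c: sum(row[c] * y for row, y in zip(design, ratings)) / n for c in cues}
    fitted = [intercept + sum(weights[c] * row[c] for c in cues) for row in design]
    ss_total = sum((y - intercept) ** 2 for y in ratings)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ratings, fitted))
    return intercept, weights, 1 - ss_resid / ss_total


def hypothetical_rating(row):
    """Invented judge: values fit, dislikes demand, and combines two cues configurally."""
    return (5 + 1.5 * row["fit"] - 1.0 * row["demand"]
            + 0.5 * row["involvement"] * row["difficulty"])


design = design_matrix()
ratings = [hypothetical_rating(row) for row in design]
intercept, weights, r_squared = capture_policy(design, ratings)
print(weights)    # fit and demand carry all of the linear weight
print(r_squared)  # below 1: the interaction term is invisible to the linear model
```

The invented judge includes one configural (interaction) term on purpose. A main-effects linear model assigns it no weight, so some of the judge's rating variance goes unexplained even though the judge is perfectly consistent.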

Results 

This study was designed to provide data comparing teachers' self-reports
of factors (cues) they considered when judging instructional activities to
cues suggested by policy-capturing and process-tracing representations of
judgment. The typical approach to examining self-reports has been to
compare the "subjective" weights that they generate to the weights
generated by the mathematical models. We will use the cue usage and
process information from the process tracing to help interpret the
similarities and discrepancies between the other two data sources.

Self-Reports 

Self-reports were solicited from all 25 teachers participating in the
study. We have separated the reports from the Policy-Capturing-Only Group
(N = 19) and the Process-Tracing Group (N = 6) for comparability, since we
have all three sources of data for only a portion of the total group.

As mentioned earlier, each teacher was asked at the end of the judgment
session to distribute a total of 100 points across the four categories of
"students," "self," "materials," and "other" (which could be specified)
for each of the four judgment questions. The distributions reported by
each teacher are shown in Table 1.

Table 1 indicates that Students and Self were by far the most heavily
weighted factors, garnering 82% of the grand total for the
Policy-Capturing-Only Group and 75% for the Process-Tracing Group. The
only distinct difference between the self-reports for the two groups was
the tendency in the Process-Tracing Group to assign more weight to the
"other" category. This difference is primarily due to relatively heavy
weighting by two of the teachers in the Process-Tracing Group (T21, T22)
on the second judgment (appropriateness). The factors entered by these two
teachers were "the discipline of language arts" and "skills taught (by the
activity)."
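
The 75% figure for the Process-Tracing Group can be checked against that group's mean row in Table 1. The sketch below assumes that the grand total is the simple average of the four judgment means, a definition the paper does not state; the function name `grand_total` is illustrative.

```python
# Mean points per category for the Process-Tracing Group (N = 6), by judgment,
# as reported in the group mean row of Table 1.
PROCESS_TRACING_MEANS = {
    "J1": {"Stu": 34, "Slf": 51, "Mat": 15, "Oth": 0},
    "J2": {"Stu": 52, "Slf": 11, "Mat": 5, "Oth": 32},
    "J3": {"Stu": 42, "Slf": 31, "Mat": 16, "Oth": 11},
    "J4": {"Stu": 67, "Slf": 11, "Mat": 16, "Oth": 6},
}


def grand_total(means_by_judgment):
    """Average each category across judgments (an assumed definition of 'grand total')."""
    categories = next(iter(means_by_judgment.values())).keys()
    n = len(means_by_judgment)
    return {c: sum(j[c] for j in means_by_judgment.values()) / n for c in categories}


totals = grand_total(PROCESS_TRACING_MEANS)
rounded = {c: round(v) for c, v in totals.items()}
print(rounded)                          # category shares in whole points
print(rounded["Stu"] + rounded["Slf"])  # combined share for Students and Self
```

Under that assumption the rounded shares are 49, 26, 13, and 12 points, which agrees with the reported grand total row, and Students plus Self come to 75 points.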



Table 1

Self-Reported Weightings for 25 Teachers Across 4 Judgments

[The rows for individual teachers are not recoverable from this copy. The
group mean rows and grand totals are shown; "--" marks a value that is not
legible.]

                 Policy-Capturing-Only           Process-Tracing
                 Group (N = 19)                  Group (N = 6)
Judgment         Stu    Slf    Mat    Oth        Stu    Slf    Mat    Oth

J1                39     42     --     --         34     51     15      0
J2                58     23     16      3         52     11      5     32
J3                58     24     14     --         42     31     16     11
J4                67     18     12      3         67     11     16      6

Grand Total       56     26     15     --         49     26     13     12

Note: Stu = students, Slf = self, Mat = materials, Oth = other.

Policy-Capturing Models

The results of the policy-capturing analysis of the 19 teachers in the
Policy-Capturing-Only Group are reported in detail in Yinger, Clark, and
Mondol (1981), so we will provide only an overall summary here. In
general, the regression equations produced by the policy-capturing
analysis did not prove to be good models of the judgment process.
Forty-four percent of the equations had no significant weights, and
overall, the five activity features that we manipulated accounted for less
than one fourth of the variance in the teachers' judgments. The
significant models were highly idiosyncratic with no discernible trends in
cue use.

The results from the six process-tracing teachers were very similar. Of
the 24 policy equations generated, only 9 were statistically significant
(p < .05). Eight of the significant equations were from two teachers, and
the average amount of total variance accounted for by the models (R²) was
only .27 (range = .21 to .37). In other words, the models had virtually no
descriptive power for four of the six judges, and when statistically
significant, could account for only small portions of the teachers'
judgment behavior.

Like the teachers in the Policy-Capturing-Only Group, the process-tracing
teachers produced differing policies. The significant equations for this
group are illustrated in Table 2. Three cues (Fit, Demand, and
Involvement) appear most frequently. (Difficulty appears only twice.) In
the table, the components of the models are ordered from most to least
heavily weighted.

Table 2

Components of the Significant Regression Equations (p < .05)
for Teachers in the Process-Tracing Group

Teacher    Judgment    Components                     R²

T21        J4          Inv                            .21

T22        J1          Fit + Dem + Diff               .37
           J2          Fit                            .25
           J3          Fit + Dem                      .21
           J4          Fit + Dem                      .28

T23        J1          Inv                            .23
           J2          Inv + Fit                      .26
           J3          Inv + Dem + Fit                .27
           J4          Inv + Dem + Fit + Diff         .36


Process-Tracing Results

We have not yet analyzed the process-tracing protocols for all six of the
teachers in the Process-Tracing Group. The analysis of two teachers is
reported in Yinger and Clark (1982). This analysis produced information
about cue use, processing strategies, and probably most important for this
analysis, a description of the way in which these two teachers transformed
the judgment task.

Basically, what the process-tracing analysis suggests is that when
confronted with an activity-judgment task of this type, these teachers did
not judge the activity-as-given, but rather transformed the activity into
a form that might actually work in his or her classroom: the
activity-to-be-used. This transformation may largely be due to the fact
that the activity descriptions constitute plans, and plans imply potential
use in a specific context. In fact, many of the mental transformation
activities were "contextualization" operations, where the teacher was
drawing on and incorporating context-specific knowledge about students,
environments, and self. What is eventually judged, then, is not the
activity presented on the page, but a modified activity in the mind of the
judge.

Applying the Three Sources of Data: A Case Study

Teacher 22 is a 4th grade teacher in a rural school with four years of
teaching experience. Her self-reports, like those of the rest of the
teachers, emphasize the Students and Self categories. There is some
variation across judgments: A majority of points are assigned to Self in
Judgments 1 and 3 (Attractiveness, Likelihood of use); her entry in the
"Other" category, "Skills taught (by materials)," captures most of the
points in Judgment 2 (Appropriateness); and nearly all of the points were
assigned to Students in Judgment 4 (Effectiveness). See Table 3 for a
summary.

An examination of the policy-capturing models of Teacher 22 indicates a
somewhat different emphasis. As summarized in Table 3, the four
significant regression equations assign heaviest weight to the factor Fit
(between stated purpose and described instructional process), which was
represented in every model and was the only factor represented in the
model for Judgment 2. Demand (on the teacher) was represented in three of
the four models, and Difficulty (for students) was incorporated once.

By comparing the policy-capturing models to the self-reports from this
teacher, one might conclude that she tends to overestimate the attention
she actually pays to student factors and to underestimate the attention
she pays to materials. The regression models, however, account for, on the
average, only about 25% of the total variance (R²), so there are likely to
be many other factors contributing to the judgments.

The process-tracing analysis may contribute to our understanding of the
factors considered by Teacher 22, since the protocols suggest the use of
certain cues. During the judgment task this teacher mentioned 22 different
cues. Four of the five cues manipulated in the activity descriptions were
among those mentioned; Integration was omitted. When judging a single
activity, this teacher used a moderate number of cues (mean = 6.81). Seven
of the cues accounted for approximately 50% of cue use. These are listed
in Table 3.

Table 3

Teacher 22: Three Sources of Data

Self-Reports

               J1     J2     J3     J4

Students       20     30     30     90
Self           60            60     10
Materials      20            10
Other                 70*

* "skills taught (by materials)"

Policy Models

J1    Fit + Dem + Diff
J2    Fit
J3    Fit + Dem
J4    Fit + Dem

Frequency of Cue Use from Process Tracing
Rank Order:

1. Prerequisite instruction needed
2. Students' task-related ability
3. Fit of stated purpose with activity description
4. Age-level appropriateness
5. Fit with own goals
6. Student interest
7. Student enjoyment

Process-tracing analysis does not produce a metric comparable to the cue
weightings derived from a regression model. We can, however, get a rough
indication of the importance of various considerations from the number of
times cues were used. (It is important to remember that frequency of use
provides no clue as to the importance of a cue in a specific deliberation
or how it was used in relation to the other cues considered at the same
time.)

An examination of Teacher 22's cue use from the process-tracing analysis
seems to complement and explain the data obtained from the policy models
and the self-reports. The seven most frequently used cues (in rank order)
are 1) prerequisite instruction needed, 2) students' task-related ability,
3) fit of stated purpose with activity description, 4) age-level
appropriateness, 5) fit with teacher's own goals, 6) student interest, and
7) student enjoyment. Mindful of the caution stated in the previous
paragraph, this information may help resolve some of the discrepancy
between the policy models and the self-reports.

The cue-use data obtained from the process analysis supports the cue use
suggested by the other two data sources. The fact that six of the seven
process-tracing cues involve deliberations about students and self (cues
ranked 1, 2, 4, 5, 6, 7) supports the teacher's impression of heavy
weighting of the Student and Self categories of the self-reports. All
seven of the cues imply a consideration of the materials at hand. The
suggestion that the judgment process is preceded by extensive
transformational and contextualization activities also supports an
emphasis on students and self, the participants in the preparation and
implementation of the activities.

The information from the process analysis also supports the data from the
policy-capturing models. Fit, the cue weighted most heavily, was the third
most frequently used cue in the process descriptions. Demand on teacher,
used in three of the regression models, was also emphasized in the process
analysis, assuming that the need for prerequisite instruction implies a
demand on the teacher to supply it.

To summarize, the three sources of data seem to provide complementary,
though somewhat different, information about this teacher's judgment. The
process model seems to be the richest source of data, because it is not
restricted to five cues determined to be important beforehand. The
self-reports seem to be in general agreement with the cue use data from
the process descriptions. The policy-capturing models provide data about
weighting that is not supplied by the process analysis. For instance, Fit
was not the most frequently used, but when used, it may have been heavily
weighted. Finally, the process-tracing analysis provides information about
the task that enables us to better understand the poor showing of the
policy-capturing models and to interpret the self-reports.



This study has provided an opportunity to examine teacher judgment from
multiple perspectives. Surprisingly, there are few examples of this kind
of comparison in the judgment literature. (Einhorn, Kleinmuntz, &
Kleinmuntz, 1979, and Yinger, 1975, are exceptions.) As a result,
researchers know little about the comparative strengths of the various
methods and the relative suitability of the methods for yielding certain
kinds of information. Our intent in initiating the set of studies, of
which this study is one part, was to compare policy-capturing and
process-tracing methods "head-to-head" to determine which was better. We
have found, like others who have attempted this (see especially Einhorn et
al., 1979), that this was a naive approach to the problem and that various
methods provide unique and complementary results.

Method Characteristics 

Policy capturing. These methods have been shown to provide high
predictive power in a wide variety of judgment settings (see Slovic &
Lichtenstein, 1971; and Slovic, Fischhoff, & Lichtenstein, 1977). They
have also yielded high explanatory results as measured by goodness-of-fit
criteria such as R² values. There is some debate, however, about whether
the success of modeling human judgment with linear models is due to the
linearity of judges' behavior or due to characteristics of linear models.

Dawes and Corrigan (1974), for example, made the observation that linear
models have typically been applied in situations where the predictor
variables are monotonically related to the criterion and where there is
error in the independent and dependent variables. They showed that these
conditions ensure good fits by linear models, regardless of whether or not
the weights in such models are optimal.
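
The point about non-optimal weights can be illustrated with a small simulation. This is a sketch under invented assumptions, not Dawes and Corrigan's data or procedure: four independent standard-normal cues, a criterion that increases in every cue with hypothetical weights `TRUE_WEIGHTS`, and random error added to the criterion only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)               # fixed seed so the run is repeatable
TRUE_WEIGHTS = (0.4, 0.3, 0.2, 0.1)  # invented weights, all positive
N_CASES = 1000

cues = [[rng.gauss(0, 1) for _ in TRUE_WEIGHTS] for _ in range(N_CASES)]
# Criterion: monotone in every cue, plus random error.
criterion = [
    sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) + rng.gauss(0, 0.5)
    for row in cues
]

true_composite = [sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) for row in cues]
unit_composite = [sum(row) for row in cues]  # every cue weighted equally

r_true = pearson(true_composite, criterion)
r_unit = pearson(unit_composite, criterion)
print(round(r_true, 2), round(r_unit, 2))
```

Under these assumptions the equally weighted composite correlates with the criterion almost as strongly as the composite built from the true weights. A good linear fit therefore says little about whether the fitted weights describe how a judge actually combines cues.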

Self-reports. The degree to which a judge is (or can be) aware of the
weight attached to his or her judgments has been the subject of
considerable debate (see, for a review of this controversy, Slovic &
Lichtenstein, 1971, and Slovic et al., 1977). More recently, the use of
verbal reports as data has come under considerable criticism (Nisbett &
Wilson, 1977). These criticisms have been convincingly rebutted by others
(Ericsson & Simon, 1980).

Ericsson and Simon (1980) have developed an information-processing theory
that specifies conditions under which verbal reports will be most
reliable. Simply put, they argue that verbal reports will be most valid
and reliable when a subject is asked to report on the contents of short
term memory, that is, that which he or she is currently attending to.
Inconsistent introspective reports will be more likely in response to
probes that are too general to elicit the information actually sought or
as a result of requests that require subjects to use inferential processes
to fill out and generalize incomplete or missing memories (p. 247).

The type of self-report requested in this study is not of the level of
reliability of in-process reports like "thinking aloud." It is a
retrospective and somewhat generalized verbal report of the type that
Ericsson and Simon would call a "general report." The fact that the
self-reports were collected immediately after the task and requested
information that was certainly a part of the judges' deliberations
provides some support for the credibility of these verbal reports. At the
most, the self-reports give some insight into the actual weightings
incorporated by the teachers. At the least, they provide indications of
the teachers' implicit theories about the various ways certain kinds of
information are considered in making instructional decisions.

Process tracing. Process tracing has produced detailed models of problem
solving and judgment in a variety of situations, both in the laboratory
and in natural settings (see Shulman & Elstein, 1975). Recently research
has demonstrated that process analysis can provide information about
judgment that is unavailable through mathematical models. Einhorn et al.
(1979) provide an anecdote demonstrating this point from a study of
judgments of the nutritional quality of breakfast cereals.

     In the protocol, the judge uses the cue "calories" many times, yet,
     the cue receives no significant weight in the regression equation
     even though it is not very highly correlated with other cues. When
     one examines the variance of the cue in the sample of cereals used,
     the discrepancy becomes clear; namely, calories has a small variance
     and so it cannot receive a high weight in the regression equation.
     In contrast, the protocol indicates that the subject was paying
     attention to this cue. Whether the subject was really using this cue
     is problematic, since our results indicate that it could not have
     been of much help in discriminating between the brands. (In the
     extreme case, a cue with no variance should be irrelevant.) In any
     case, the process model clearly shows that one can attend to cues,
     and thus feel that one has used them, without such cues receiving
     significant weights in the regression analysis (p. 482).

Further support for the role of process tracing in interpreting the
results of other data has been offered by Einhorn and Hogarth (1981):

     The use of weights in models as reflecting differential cue
     importance ignores the importance of attention in subjective weight
     estimates. . . . Correspondence between subjective and statistical
     weights requires that people attend to and evaluate cues and that
     cues contain both variance and low intercorrelations. Disagreement
     between subjective and statistical weights can thus occur for three
     reasons: 1) people indeed lack insight; 2) people attend to, but
     cannot use, cues that lack variance (Einhorn et al., 1979); 3) cues
     to which attention is not paid are correlated with others such that
     the nonattended cues receive inappropriate statistical weights.
     Both process-tracing methods and statistical modeling are necessary
     to untangle these competing interpretations. (pp. 62-63)

Multi-method approaches. We join others in advocating the importance of
approaching the study of human judgment using a variety of methods. No one
method provides all the necessary information. As Einhorn et al. (1979) put
it, "some process modelers may not be seeing the forest for the trees while
some statistical modelers may not see any trees in the forest" (p. 483).
While various approaches treat the underlying process at different levels of
detail, each method can provide important data. Process analysis provides
information about the judgment process and cue use. Policy-capturing models
can provide information about the relative emphasis put on various cues.
Self-reports may provide confirmation of the information provided by the
other methods, and depending upon the way in which probes are directed, reflect
actual processes or the judge's beliefs about how these decisions should be
made.

Factors Influencing Judgment

The task. Cognitive approaches to understanding human problem solving
and decision making have characterized thinking as being adaptive to the
situation at hand. Similarly, researchers have found that the behavior of
the person solving a problem or making a decision tells us as much or more
about the structure of the task as about the unique characteristics of the
person involved (Shulman & Elstein, 1975, p. 14). As a result, researchers
who study decision making have increasingly turned their attention to trying
to better understand and model task effects (see Einhorn & Hogarth, 1981).

Two factors related to task effects have become especially salient in the
research presented in this paper. First, the nature of the task seems to
interact strongly with the assumptions and suitability of the methods used to
model judgment behavior. For example, policy-capturing models have proven to
be most effective in situations where the objects to be judged are
self-contained in terms of the information needed by the judge and where the
judgment task requires little or no manipulation of the files prior to
judgment. Examples of these as-given tasks include judging an admission file,
an MMPI profile, or a simulated student profile. These kinds of judgment
tasks do not require that the judge use or plan to use the items in any real
way. In contrast, judgment tasks such as those presented in this study imply
a to-be-used criterion.

The activity descriptions used in this study were, in effect, plans for
action: descriptions of what was to be done by a class. The might-be-done
aspect seemed to require transformation of the activities, primarily by
placing them in the context of the individual teacher and his or her students.
As a result of this mental manipulation, much additional information was
brought to the activity descriptions, and the teachers seemed to be judging
not the activities-as-given but the activities-as-imagined-in-use (see Yinger
& Clark, 1982, for more detail).

Therefore, the failure of the policy-capturing models may be partly a
function of the task framework. An examination of the judgment literature
indicates that mathematical models seem to be most successful in judgment
situations that are self-contained, that is, situations where objects may be
judged solely as given. Studies that have been less successful in representing
judgment with these models have typically used more complex stimuli (e.g.,
written descriptions vs. numerical profiles) and have implied an in-use
criterion (see, for example, Borko, 1978; Cone, 1978; Russo, 1978; and Floden
et al., 1981).

The second consideration related to task is the apparent trade-off 
between control and representativeness of the judgment task. As mentioned 
above, mathematical models seem to be best suited to laboratory tasks that can 
be simplified and controlled. As tasks more closely resemble real-life
judgments, they become more difficult to model using these methods. Einhorn
and Hogarth (1981), in discussing this problem, refer to the work of Ebbesen
and Konecni (1980), who have studied several judgment tasks in both laboratory
and natural settings (for example, setting of bail and driving a car) and have
found major differences in results. Einhorn and Hogarth cite Ebbesen and
Konecni's conclusions:

There is considerable evidence to suggest that the external validity 
of decision making research that relies on laboratory simulations of 
real-world decision problems is low. Seemingly insignificant
features of the decision task and measures cause people to alter their
decision strategies. The context in which the decision problem is 
presented, the salience of alternatives, the number of cues, the 
concreteness of the information, the order of presentation, the 
similarity of cue to alternative, the nature of the decomposition, 
the form of the measures, and so on, seem to affect the decisions 
that subjects make. (Einhorn & Hogarth, 1981, p. 81)

Experience. Among cognitive researchers, the nature and the effects of
experience have become a popular topic of research and theory. Much of this
work has been devoted to the differences between novices and experts in
various task environments and the factors contributing to an eventual shift
from novice to expert.

Two findings from this research suggest that the experience level of a
judge should be taken into consideration in interpreting the results of
various modeling methods. First, research indicates that experts are more
likely than novices to recognize (perceive/understand) and represent
problems using large-scale functional units (e.g., schemas, scripts, routines)
that focus on the crucial underlying structure and components of the problem
(de Groot, 1965; Hinsley, Hayes, & Simon, 1977; Larkin, 1979; Newell & Simon,
1972). The reliance on these large units of knowledge and skill suggests that
judgment tasks may activate large pieces of knowledge and experience that
immediately become part of the judgment task. Second, researchers have found
that experts are more likely than novices to mentally simulate action prior to
its execution by means of incorporating complex and detailed representations
of action within a particular environment (de Groot, 1965; Jeffries, 1982;
Larkin, 1979).

These findings suggest that the more experience a person has, the more
likely he or she is to embellish and transform the information provided in the
judgment task. Modeling methods such as policy capturing that assume
knowledge and control of the content of the objects being judged may be less
accurate for experienced judges. In comparison, we would hypothesize that
novice judges, having less stored knowledge and experience to draw upon and
incorporate into their judgments, would rely more exclusively on the
information presented in the task. In this latter case, mathematical models
would be expected to have a better fit.

A study by Slovic, Fleissner, and Bowman (1972) provides results that are
consistent with these predictions. In a study of investment decision making,
they found a negative correlation (-.43) between years of experience and
self-insight. Self-insight was calculated by correlating a broker's subjective
weights (self-reported) with his calculated effects (mathematical model). In
other words, the more experienced the broker, the less agreement there was
between his or her self-reported policy and that generated by the policy
capturing.
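
The self-insight index described above is simply a correlation between two sets of cue weights for the same judge. A minimal sketch of that calculation follows. The cue weights for the two judges are invented for illustration and are not taken from Slovic et al. (1972), and the function names `pearson` and `self_insight` are our own labels.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def self_insight(subjective_weights, model_weights):
    """Agreement between a judge's self-reported cue weights and the
    weights estimated for that judge by a statistical model."""
    return pearson(subjective_weights, model_weights)


# Hypothetical data: five cues per judge. Subjective weights are points
# out of 100 distributed among the cues; model weights are relative
# weights from a fitted policy-capturing model.
judge_a_subjective = [40, 30, 15, 10, 5]
judge_a_model = [0.42, 0.28, 0.14, 0.10, 0.06]

judge_b_subjective = [40, 30, 15, 10, 5]
judge_b_model = [0.10, 0.15, 0.35, 0.05, 0.35]

insight_a = self_insight(judge_a_subjective, judge_a_model)
insight_b = self_insight(judge_b_subjective, judge_b_model)

print(f"judge A self-insight: {insight_a:.2f}")
print(f"judge B self-insight: {insight_b:.2f}")
```

A low index shows only that the two descriptions of the judge's policy disagree. It does not indicate which of the two descriptions is the inaccurate one.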

Slovic and his colleagues interpreted these results as possibly
suggesting that "the most experienced analysts produce verbal rationales for
their evaluations that are less trustworthy than those of their inexperienced
colleagues" (p. 300). Based on our discussion above, we would interpret these
results as suggesting the inability of the mathematical model to represent
accurately what the judge is actually doing.
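
This alternative interpretation can be sketched as a small simulation. It is a toy illustration under stated assumptions, not evidence: the simulated "novice" rates each object from the presented cue alone, while the simulated "experienced" judge adds privately imagined context that the model never sees. All quantities, including the `private_sd` parameter and the function `model_fit`, are hypothetical. With a single predictor, the squared correlation equals the R-squared of a linear model fitted to the presented cue.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def model_fit(private_sd, n=500, seed=1):
    """R-squared of a one-cue linear model for a simulated judge.

    private_sd controls how much unpresented, judge-supplied
    information enters each rating (0.0 means none).
    """
    rng = random.Random(seed)
    presented_cue = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Information the judge brings to the task; invisible to the model.
    private_info = [private_sd * rng.gauss(0.0, 1.0) for _ in range(n)]
    rating = [c + p + rng.gauss(0.0, 0.3)
              for c, p in zip(presented_cue, private_info)]
    return pearson(presented_cue, rating) ** 2


novice_fit = model_fit(private_sd=0.0)
experienced_fit = model_fit(private_sd=1.5)

print(f"novice R^2:      {novice_fit:.2f}")
print(f"experienced R^2: {experienced_fit:.2f}")
```

In this sketch both simulated judges use the presented cue in exactly the same way. The poorer fit for the simulated experienced judge arises entirely from information the model cannot observe, not from any lack of consistency or insight on the judge's part.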



Based on this study and the research presented in the discussion, we
offer four conclusions as hypotheses to be considered in future research on
teacher judgment.

First, we think that judges have better insight into their own decision
processes than researchers have typically given them credit for. Researchers
need to pay closer attention to the differences in language and level of
detail offered by various methods. Researchers also need to evaluate carefully
what kind of data will be used as a criterion to evaluate the validity of
verbal reports.

Second, the form and complexity of the judgment task must be considered
in evaluating the results of various modeling methods. Researchers need to
develop better models of the tasks in which judgment is being examined.

Third, the more experienced the judge, the more likely he or she is to be
judging elaborated and transformed mental models rather than the
objects-as-given. Researchers need to know more about how experience
influences judgment.

Fourth, no single method of modeling judgment is better than the others
for all purposes. The three sources of data used in this study each provided
different, though complementary, information. Accurate descriptions of teacher
judgment will be more likely if multi-method approaches are employed by
researchers.